-
Notifications
You must be signed in to change notification settings - Fork 313
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
replay: do not recheck duplicate confirmation if already confirmed #1237
Conversation
core/src/replay_stage.rs
Outdated
if progress.is_duplicate_confirmed(*slot).unwrap_or(false) { | ||
// Already duplicate confirmed, nothing left to do | ||
continue; | ||
} | ||
|
||
progress.set_duplicate_confirmed_slot(*slot); | ||
assert!(*frozen_hash != Hash::default()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, would this existing
agave/core/src/replay_stage.rs
Lines 4084 to 4088 in abdfbca
if let Some(prev_hash) = duplicate_confirmed_slots.insert(*slot, *frozen_hash) { | |
assert_eq!(prev_hash, *frozen_hash); | |
// Already processed this signal | |
return; | |
} |
duplicate_confirmed_slots
in confirm_forks
?
Better yet, can we just remove duplicate_confirmed_slots
and replace it with the progress map like you have here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the idea for splitting this up into confirm_forks
and mark_slot_confirmed
was so confirm_forks
is side effect free for unit testing.
And we can't replace duplicate_confirmed_slots
with progress_map
unfortunately because progress_map
only holds replayed forks, however a slot could be duplicate_confirmed
through gossip, so we have to track it separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm is there a reason not to have progress map also track gossip duplicate confirmed forks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I gave it a try here: AshwinSekar@9d14e0d
The main issue is that ProgressMap
is a wrapper around this HashMap
:
agave/core/src/consensus/progress_map.rs
Lines 258 to 260 in 3ca6719
pub struct ProgressMap { | |
progress_map: HashMap<Slot, ForkProgress>, | |
} |
We insert new entries based on the bank:
agave/core/src/replay_stage.rs
Lines 2932 to 2934 in 3ca6719
let bank_progress = progress.entry(bank.slot()).or_insert_with(|| { | |
ForkProgress::new_from_bank( | |
&bank, |
Because ForkProgress
and it's sub struct ForkStats
are mostly comprised of information we can only get from replay:
agave/core/src/consensus/progress_map.rs
Lines 72 to 85 in 3ca6719
pub struct ForkProgress { | |
pub is_dead: bool, | |
pub fork_stats: ForkStats, | |
pub propagated_stats: PropagatedStats, | |
pub replay_stats: Arc<RwLock<ReplaySlotStats>>, | |
pub replay_progress: Arc<RwLock<ConfirmationProgress>>, | |
pub retransmit_info: RetransmitInfo, | |
// Note `num_blocks_on_fork` and `num_dropped_blocks_on_fork` only | |
// count new blocks replayed since last restart, which won't include | |
// blocks already existing in the ledger/before snapshot at start, | |
// so these stats do not span all of time | |
pub num_blocks_on_fork: u64, | |
pub num_dropped_blocks_on_fork: u64, | |
} |
When we root we use bank forks to retain:
agave/core/src/consensus/progress_map.rs
Lines 395 to 398 in 3ca6719
pub fn handle_new_root(&mut self, bank_forks: &BankForks) { | |
self.progress_map | |
.retain(|k, _| bank_forks.get(*k).is_some()); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we really wanted to put it in progress_map
, we would have to put it at the top level something like:
pub struct ProgressMap {
progress_map: HashMap<Slot, ForkProgress>,
duplicate_confirmed_slots: BTreeMap<Slot, Hash>
}
But at that point we're just tracking two separate things in a struct called ProgressMap
which isn't really a progress indicator anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, essentially duplicate_confirmed_slots right now tracks slots that might not be in ProgressMap at all because we haven't replayed the slot, but have detected the slot via gossip
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #1237 +/- ##
=========================================
- Coverage 82.1% 82.1% -0.1%
=========================================
Files 886 886
Lines 236439 236499 +60
=========================================
+ Hits 194252 194272 +20
- Misses 42187 42227 +40 |
abdfbca
to
4a142ca
Compare
I've also removed the previous |
if !bank.is_frozen() { | ||
continue; | ||
} | ||
if tower.is_slot_duplicate_confirmed(*slot, voted_stakes, total_stake) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
realized the is_slot_duplicate_confirmed
detection uses voted_stakes
which pops off some slots due to the simulated vote. Probably could improve that a bit by having a separate version of voted_stakes
that doesn't simulate the vote
Would also make the
datapoint_info!(
"bank_weight",
("slot", bank_slot, i64),
("fork_stake", stats.fork_stake, i64),
("fork_weight", stats.fork_weight(), f64),
);
log a bit more accurate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, actually what do you think about using fork choice for DC? Can address in a future PR.
It seems like it should be fine to aggregate DC amongst children forks? It would also save us from the double iteration, rn we iterate over all the newly frozen banks to grab a voted stakes from each and then iterate over all unrooted slot to see if any voted stakes DC's the unrooted slot.
Instead we could just iterate once over all unrooted slots and check fork choice to see if it's DC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only downside of using fork choice is it only tracks the latest vote, so if people jump around I think it can miss DC
core/src/replay_stage.rs
Outdated
if progress.is_duplicate_confirmed(*slot).unwrap_or(false) { | ||
// Already duplicate confirmed, nothing left to do | ||
continue; | ||
} | ||
|
||
progress.set_duplicate_confirmed_slot(*slot); | ||
assert!(*frozen_hash != Hash::default()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm is there a reason not to have progress map also track gossip duplicate confirmed forks?
4a142ca
to
636a001
Compare
Problem
We perform the replay duplicate confirmed check as long as the slot is not supermajority confirmed.
This can lead to the same slot being checked multiple times, notifying state machine multiple times, and the log / metric being spammed as well.
Summary of Changes
Separately track the duplicate confirmed status and only check slots that are not duplicate confirmed.
Fixes #1236